Equity Pairs Selection in the Russell 2000

A Stability-Focused, Industry-Constrained Statistical Arbitrage Pipeline

Author

Zach Quicksall

Published

February 15, 2026

1 Motivation and Objective

Pairs trading strategies rely on identifying asset pairs whose relative price dynamics are stable and mean-reverting. In practice, many statistically appealing pairs fail out-of-sample due to regime shifts, structural breaks, or spurious correlations.

The objective of this analysis is to construct a robust pair selection pipeline for U.S. equities that:

  • Emphasizes economic coherence via industry constraints
  • Separates candidate generation from statistical validation
  • Explicitly evaluates out-of-sample stability
  • Prioritizes tradability, not just statistical significance

The focus is deliberately on pair selection, rather than signal generation, execution, or portfolio construction.

2 Data and Universe Construction

2.1 Equity Universe

The starting universe consists of equities from the Russell 2000, representing small-capitalization U.S. stocks.

To improve robustness and realism, the universe is filtered to remove:

  • Securities with insufficient price history
  • Illiquid or irregularly traded names
  • Symbols with missing or inconsistent adjusted prices

All prices are transformed to log adjusted close prices to stabilize variance and allow linear modeling.
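The filters and log transform above can be sketched as follows. This is a minimal illustration, not the report's actual implementation: it assumes `adj_close` and `volume` are pandas DataFrames indexed by date with one column per ticker, and the function name and thresholds (`min_obs`, `max_missing_frac`, `min_median_dollar_vol`) are hypothetical placeholders.

```python
import numpy as np
import pandas as pd

def filter_universe(adj_close, volume, min_obs=750,
                    max_missing_frac=0.02, min_median_dollar_vol=1e6):
    """Return log adjusted closes for tickers that pass all three screens.

    All thresholds are illustrative assumptions, not values from the report.
    """
    n_obs = adj_close.notna().sum()                   # length of price history
    missing_frac = adj_close.isna().mean()            # share of missing observations
    positive = (adj_close.fillna(1.0) > 0).all()      # log needs strictly positive prices
    dollar_vol = (adj_close * volume).median()        # crude liquidity proxy
    keep = (
        (n_obs >= min_obs)
        & (missing_frac <= max_missing_frac)
        & positive
        & (dollar_vol >= min_median_dollar_vol)
    )
    return np.log(adj_close.loc[:, keep])
```

Median dollar volume is only one possible liquidity proxy; a production filter might also look at the share of zero-volume days or at quoted spreads.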

2.2 Industry Classification

Each security is assigned to an industry classification (e.g., GICS). All pair construction and evaluation are performed within industry groups only. This constraint:

  • Eliminates spurious cross-industry relationships
  • Enforces economic interpretability
  • Aligns with how relative-value equity strategies are typically deployed

3 Train/Test Split

To explicitly evaluate stability, the sample is split into:

  • Training period: used for estimation and selection
  • Test period (~5 years): used only for validation

All model estimation (hedge ratios, cointegration tests, half-life estimation) is performed only on the training window unless explicitly stated otherwise. No information from the test window is used during candidate generation.
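A chronological split of this kind might look like the sketch below. It assumes `log_prices` is a pandas DataFrame with a sorted `DatetimeIndex`; the function name and the `test_years` parameter are illustrative, and the five-year default simply mirrors the approximate test length stated above.

```python
import pandas as pd

def split_train_test(log_prices, test_years=5):
    """Chronological split: the final `test_years` years form the test window."""
    cutoff = log_prices.index.max() - pd.DateOffset(years=test_years)
    train = log_prices.loc[log_prices.index <= cutoff]   # estimation and selection
    test = log_prices.loc[log_prices.index > cutoff]     # validation only
    return train, test
```

Splitting strictly by date, rather than by random sampling, keeps every test observation later than every training observation, which is what prevents look-ahead leakage.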

4 Candidate Pair Generation

4.1 Return Correlation as Pre-Screen

Candidate pairs are generated using return correlations, computed over the training period. Return correlation is used only to reduce the combinatorial search space, not as evidence of mean reversion.

For each stock:

  • Correlations are computed against peers within the same industry
  • The top-N most correlated neighbors are retained

This produces a tractable, economically coherent candidate set while avoiding arbitrary clustering assumptions.
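The pre-screen described above could be sketched roughly as follows. The inputs are assumptions: `log_prices` is a training-window DataFrame of log prices (dates by tickers) and `industry` is a pandas Series mapping each ticker to its industry label. The function name, the toy tickers, and the value of `n` are hypothetical.

```python
import numpy as np
import pandas as pd

def top_n_neighbors(log_prices, industry, n=5):
    """For each stock, keep its n most return-correlated peers in the same industry."""
    returns = log_prices.diff().iloc[1:]              # log returns from log prices
    rows = []
    for _, members in industry.groupby(industry):     # one group per industry label
        tickers = [t for t in members.index if t in returns.columns]
        if len(tickers) < 2:
            continue                                  # no within-industry peer available
        corr = returns[tickers].corr()
        for x in tickers:
            peers = corr[x].drop(x).nlargest(n)       # exclude self-correlation
            rows.extend((x, y, float(c)) for y, c in peers.items())
    return pd.DataFrame(rows, columns=["x", "y", "corr"])

# Toy data: AAA and BBB share a common factor, the other series are independent.
rng = np.random.default_rng(0)
common = rng.normal(0, 0.02, 500)
toy_returns = pd.DataFrame({
    "AAA": common + rng.normal(0, 0.002, 500),
    "BBB": common + rng.normal(0, 0.002, 500),
    "CCC": rng.normal(0, 0.02, 500),
    "DDD": rng.normal(0, 0.02, 500),
    "EEE": rng.normal(0, 0.02, 500),
})
toy_industry = pd.Series({"AAA": "Banks", "BBB": "Banks", "CCC": "Banks",
                          "DDD": "Utilities", "EEE": "Utilities"})
candidates = top_n_neighbors(toy_returns.cumsum(), toy_industry, n=1)
```

Because each stock contributes its own neighbor list, the output contains both (AAA, BBB) and (BBB, AAA); symmetric duplicates still need to be collapsed afterwards.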

4.2 Refinement of Pairs

After de-duplication of symmetric pairs, the result is a long-form table of candidate pairs that serves as the baseline pair set for all subsequent statistical analysis.
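One simple way to de-duplicate symmetric pairs is sketched below. It assumes the candidates arrive as an iterable of `(x, y)` ticker tuples and keeps the first orientation encountered; the report does not say which orientation is retained, so that choice is an assumption.

```python
def dedupe_pairs(pairs):
    """Collapse (x, y) and (y, x) into one entry, keeping the first-seen orientation."""
    seen = set()
    unique = []
    for x, y in pairs:
        key = frozenset((x, y))          # order-insensitive identity of the pair
        if x != y and key not in seen:   # also drop degenerate self-pairs
            seen.add(key)
            unique.append((x, y))
    return unique

pairs = dedupe_pairs([("WSFS", "ABCB"), ("ABCB", "WSFS"),
                      ("FELE", "SPXC"), ("FELE", "FELE")])
```

Orientation matters downstream because the first ticker is the dependent variable in the hedge-ratio regression, so a fixed and documented rule for choosing it is worth having.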

5 Pair Diagnostics and Metrics

Each candidate pair is evaluated using a consistent set of diagnostics designed to capture long-run relationship strength, mean-reversion dynamics, and practical tradability. Metrics are computed primarily on the training window, with selected diagnostics recomputed out-of-sample for validation.

5.1 Hedge Ratio Estimation

For each candidate pair, a hedge ratio is estimated using ordinary least squares on log prices over the training window:

\[ \log P_{x,t} = \alpha + \beta \log P_{y,t} + \varepsilon_t \]

The spread is then defined as the residual of this regression:

\[ s_t = \log P_{x,t} - \left(\alpha + \beta \log P_{y,t}\right) \]

This spread represents the relative mispricing between the two securities under the assumed linear relationship.

In addition to the hedge ratio itself, several fit-quality diagnostics are retained, including the regression R^2 and residual volatility. These metrics help distinguish tightly linked pairs from relationships that are statistically significant but economically loose.
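The regression, spread, and fit-quality diagnostics can be sketched with plain NumPy. The function name and the returned dictionary keys are illustrative assumptions; only the OLS specification itself comes from the text.

```python
import numpy as np

def fit_hedge_ratio(log_px, log_py):
    """OLS of log P_x on a constant and log P_y; returns coefficients, spread, diagnostics."""
    x = np.asarray(log_px, dtype=float)
    y = np.asarray(log_py, dtype=float)
    design = np.column_stack([np.ones_like(y), y])        # columns: intercept, log P_y
    (alpha, beta), *_ = np.linalg.lstsq(design, x, rcond=None)
    spread = x - (alpha + beta * y)                       # s_t, the regression residual
    ss_res = float(spread @ spread)
    ss_tot = float(((x - x.mean()) ** 2).sum())
    return {
        "alpha": float(alpha),
        "beta": float(beta),
        "r2": 1.0 - ss_res / ss_tot,
        "resid_vol": float(spread.std(ddof=2)),           # two estimated parameters
        "spread": spread,
    }

# Toy example: x is built to track 1.2 * y plus a small amount of noise.
rng = np.random.default_rng(42)
toy_y = np.cumsum(rng.normal(0, 0.02, 500))
toy_x = 0.5 + 1.2 * toy_y + rng.normal(0, 0.002, 500)
fit = fit_hedge_ratio(toy_x, toy_y)
```

To evaluate the pair out of sample, the training-window `alpha` and `beta` would be applied unchanged to test-window prices rather than re-estimated, so the test spread reflects the relationship as it was known at selection time.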

5.2 Cointegration Testing

Evidence of a stable long-run equilibrium relationship is assessed using an Augmented Dickey–Fuller (ADF) test applied to the training-period spread.

\[ \Delta s_t = \gamma s_{t-1} + \sum_{i=1}^{k} \phi_i \Delta s_{t-i} + \epsilon_t \]

\[ \gamma < 0 \]

The ADF test takes a unit root (\(\gamma = 0\)) as its null hypothesis; rejecting it in favor of \(\gamma < 0\) indicates that the spread is stationary, a necessary (but not sufficient) condition for tradable mean reversion. For each pair, both the ADF test statistic and p-value are recorded.

Cointegration is treated as an initial screening signal, not a final decision rule. Many cointegrated pairs exhibit impractically slow or unstable dynamics and are filtered later through additional diagnostics.
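To make the test regression concrete, the sketch below estimates \(\gamma\) and its t-statistic with NumPy, using the no-constant form shown above. It is an illustration only: the function name and the default lag order `k` are assumptions, and it deliberately stops at the test statistic. Converting the statistic to a p-value requires Dickey–Fuller-type critical values, which libraries such as statsmodels (`adfuller`, `coint`) provide.

```python
import numpy as np

def adf_tstat(spread, k=1):
    """Estimate gamma and its t-statistic in
    ds_t = gamma * s_{t-1} + sum_{i=1..k} phi_i * ds_{t-i} + e_t."""
    s = np.asarray(spread, dtype=float)
    ds = np.diff(s)                                   # ds[j] = s[j+1] - s[j]
    target = ds[k:]                                   # first k differences are used as lags
    columns = [s[k:-1]]                               # lagged level aligned with target
    for i in range(1, k + 1):
        columns.append(ds[k - i:len(ds) - i])         # i-th lagged difference
    design = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    dof = len(target) - design.shape[1]
    sigma2 = float(resid @ resid) / dof
    cov = sigma2 * np.linalg.inv(design.T @ design)   # classical OLS covariance
    gamma = float(coef[0])
    return gamma, gamma / float(np.sqrt(cov[0, 0]))
```

A strongly negative t-statistic is evidence against the unit root. Because the spread is itself a residual from an estimated hedge ratio, the statistic should not be compared with a standard normal table.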

5.3 Mean Reversion Speed

To quantify the speed of mean reversion, the training-period spread is approximated using an AR(1) process:

\[ s_t = \rho s_{t-1} + \epsilon_t \]

For \(0 < \rho < 1\), the implied half-life of mean reversion is computed as:

\[ \text{Half-life} = -\frac{\log 2}{\log \rho} \]

Half-life provides a direct, interpretable measure of how quickly deviations from equilibrium decay. Pairs with extremely long half-lives are penalized, as they imply slow convergence and extended holding periods, while extremely short half-lives are treated cautiously as potential noise-driven artifacts.
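As a worked example, \(\rho = 0.97\) gives a half-life of \(-\log 2 / \log 0.97 \approx 22.8\) periods. A minimal estimation sketch follows. It assumes the spread is demeaned before fitting (the training spread is already zero-mean by construction, but a test-window spread need not be) and returns an infinite half-life when the estimated \(\rho\) falls outside (0, 1); both choices are assumptions rather than details given in the report.

```python
import numpy as np

def half_life(spread):
    """Fit s_t = rho * s_{t-1} + e_t by least squares and return (rho, half-life)."""
    s = np.asarray(spread, dtype=float)
    s = s - s.mean()                          # assumption: work with the demeaned spread
    lagged, current = s[:-1], s[1:]
    rho = float(lagged @ current / (lagged @ lagged))
    if not 0.0 < rho < 1.0:
        return rho, float("inf")              # no well-defined monotone decay
    return rho, float(-np.log(2.0) / np.log(rho))
```

The half-life is expressed in the same time units as the input series, so for daily data it is a number of trading days.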

5.4 Tradability Proxies

Statistical validity alone is insufficient for practical deployment. Several additional diagnostics are included to proxy real trading behavior:

  • Spread volatility (standard deviation and interquartile range)
  • Zero-crossing frequency, measuring how often the spread crosses its mean
  • Excursion frequency, defined as the proportion of time the spread deviates beyond ±2 standard deviations

These measures help identify spreads that are both active and stable, filtering out relationships that are statistically stationary but rarely generate actionable deviations.
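These proxies might be computed along the lines below. The exact normalizations are assumptions: zero-crossings are expressed as crossings per observation interval, and the excursion threshold uses the sample standard deviation of the spread around its own mean. The function name and dictionary keys are illustrative.

```python
import numpy as np

def tradability_proxies(spread):
    """Spread volatility, zero-crossing frequency, and time spent beyond +/- 2 SD."""
    s = np.asarray(spread, dtype=float)
    centered = s - s.mean()
    sd = float(s.std(ddof=1))
    q75, q25 = np.percentile(s, [75, 25])
    signs = np.sign(centered)
    signs = signs[signs != 0]                               # ignore exact zeros
    crossings = int(np.count_nonzero(signs[1:] != signs[:-1]))
    return {
        "sd": sd,
        "iqr": float(q75 - q25),
        "zero_cross": crossings / (len(s) - 1),             # crossings per interval
        "pct_outside_2sd": float(np.mean(np.abs(centered) > 2.0 * sd)),
    }

example = tradability_proxies([1.0, -1.0, 1.0, -1.0])
```

In the toy input the spread flips sign on every step, so the zero-crossing frequency is 1.0, and no observation lies beyond two standard deviations.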

6 Out-of-Sample Validation

To assess robustness, key diagnostics are recomputed on a held-out five-year test window, including:

  • Cointegration p-values
  • Mean-reversion half-life

Rather than enforcing strict pass/fail rules, out-of-sample behavior is incorporated as a stability signal. Pairs that retain cointegration and similar mean-reversion characteristics out-of-sample are rewarded, while pairs exhibiting large regime-dependent shifts are penalized.

This approach balances robustness with flexibility, avoiding excessive reliance on any single test while still discouraging overfit relationships.
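The report does not specify how the stability signal is computed. One hypothetical way to turn the train and test half-lives into a continuous signal is the ratio of the smaller to the larger, which equals 1 when they agree and decays toward 0 as they diverge in either direction. The function below is an illustrative assumption, not the report's formula.

```python
import math

def stability_signal(hl_train, hl_test):
    """Hypothetical stability score in [0, 1]: ratio of smaller to larger half-life."""
    finite = math.isfinite(hl_train) and math.isfinite(hl_test)
    if not finite or min(hl_train, hl_test) <= 0:
        return 0.0                       # undefined or non-reverting in either window
    return min(hl_train, hl_test) / max(hl_train, hl_test)
```

Under this definition a pair whose half-life doubles out of sample scores 0.5, the same as one whose half-life halves, so the signal penalizes instability symmetrically rather than rewarding faster reversion.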

7 Pair Ranking Framework

Candidate pairs are ranked using a composite scoring framework that combines:

  • Training-period cointegration strength
  • Out-of-sample cointegration confirmation
  • Mean-reversion speed
  • Stability of dynamics across regimes
  • Tradability proxies such as zero-crossings and spread excursions

Each component is normalized within the candidate set and combined using a weighted average. Hard thresholds are applied sparingly to remove clearly unsuitable pairs, while most decisions are handled through continuous scoring to avoid brittle cutoff effects.
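A weighted-average score of this kind can be sketched as below. The min-max normalization, the component names, and the equal weights are all illustrative assumptions; the report does not state which normalization or weights were used.

```python
import numpy as np

def minmax(values, higher_is_better=True):
    """Rescale to [0, 1] within the candidate set, flipping when lower is better."""
    v = np.asarray(values, dtype=float)
    span = v.max() - v.min()
    z = np.zeros_like(v) if span == 0 else (v - v.min()) / span
    return z if higher_is_better else 1.0 - z

def composite_score(components, weights):
    """components: name -> (values, higher_is_better); weights: name -> weight."""
    total = sum(weights[name] for name in components)
    score = sum(weights[name] * minmax(vals, hib)
                for name, (vals, hib) in components.items())
    return score / total

# Three hypothetical pairs with made-up diagnostics, ordered from best to worst.
components = {
    "coint_pvalue": ([0.01, 0.05, 0.09], False),   # lower p-value is better
    "half_life": ([20.0, 40.0, 80.0], False),      # faster reversion is better
    "zero_cross": ([0.08, 0.05, 0.02], True),      # more crossings is better
}
weights = {"coint_pvalue": 1.0, "half_life": 1.0, "zero_cross": 1.0}
scores = composite_score(components, weights)
```

Min-max scaling is sensitive to outliers in the candidate set; rank-based normalization is a common alternative when a few extreme values would otherwise compress the remaining scores.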

The resulting ranked list emphasizes stability, interpretability, and practical relevance rather than in-sample optimization.

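The table below lists the ranked candidate pairs in descending order of composite score. Columns x and y give the two tickers, coint_pvalue is the ADF p-value, half_life is the AR(1) half-life, zero_cross is the zero-crossing frequency, pct_outside_2sd is the proportion of observations beyond ±2 standard deviations, and score is the composite score.

x y coint_pvalue half_life zero_cross pct_outside_2sd score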
FELE SPXC 0.0100000 22.13237 0.0806324 0.0663507 0.9268802
WSFS ABCB 0.0100000 18.60860 0.0735178 0.0663507 0.9217349
FIBK ONB 0.0100000 21.60553 0.0798419 0.0545024 0.9182047
TCBI ABCB 0.0100000 24.40199 0.0814229 0.0655608 0.9173677
WSFS ASB 0.0100000 23.98015 0.0814229 0.0568720 0.9064063
UCB VLY 0.0100000 19.54379 0.0640316 0.0631912 0.9049587
WSFS TCBI 0.0100000 18.79628 0.0687747 0.0552923 0.8960633
WSFS VLY 0.0100000 23.79069 0.0743083 0.0537125 0.8863476
WSFS HWC 0.0100000 19.16024 0.0545455 0.0655608 0.8857265
AUB VLY 0.0100000 24.43359 0.0861660 0.0339652 0.8837480
RNST TCBI 0.0100000 14.75160 0.0845850 0.0323855 0.8673205
HRI NPO 0.0100000 20.86016 0.0632411 0.0497630 0.8602992
FELE ESE 0.0100000 23.91527 0.0798419 0.0236967 0.8587033
IBOC ONB 0.0100000 14.34639 0.0830040 0.0268562 0.8519237
RNST ABCB 0.0100000 25.05281 0.0608696 0.0458136 0.8465640
NWE POR 0.0100000 18.68287 0.0474308 0.0466035 0.8412922
FULT HWC 0.0100000 20.75629 0.0569170 0.0229068 0.8381763
RDN ESNT 0.0100000 23.62156 0.0545455 0.0426540 0.8371403
CATY HWC 0.0100000 19.69836 0.0490119 0.0308057 0.8365984
HP RIG 0.0100000 23.56871 0.0474308 0.0497630 0.8339788
FIBK IBOC 0.0100000 25.41529 0.0577075 0.0473934 0.8331478
IBOC AUB 0.0100000 26.22029 0.0513834 0.0639810 0.8325338
WSFS UMBF 0.0100000 25.87500 0.0513834 0.0576619 0.8296991
SLG MAC 0.0121665 25.49511 0.0561265 0.0758294 0.8294071
BKH POR 0.0100000 24.61036 0.0577075 0.0379147 0.8236308
ASB HWC 0.0100000 28.48705 0.0545455 0.0505529 0.8219903
RNST WSFS 0.0100000 26.04240 0.0577075 0.0363349 0.8199065
UCB AUB 0.0100000 22.77765 0.0498024 0.0252765 0.8141762
WHD AROC 0.0100000 12.88370 0.0788043 0.0217096 0.8102753
HWC VLY 0.0100000 26.72348 0.0442688 0.0323855 0.7816520
ACA ESE 0.0100000 13.44216 0.0619308 0.0236364 0.7810495
HP MUR 0.0100050 36.30073 0.0513834 0.0576619 0.7785563
FIBK PIPR 0.0100000 29.17451 0.0521739 0.0331754 0.7747629
CATY WSFS 0.0100000 27.39795 0.0458498 0.0252765 0.7741406
MHO IBP 0.0100000 33.96137 0.0608696 0.0308057 0.7669593
WHD LBRT 0.0155545 19.78825 0.0692935 0.0257802 0.7656472
FIBK AX 0.0100000 32.59884 0.0474308 0.0513428 0.7640921
WHD MUR 0.0190183 20.31691 0.0638587 0.0502035 0.7622161
CDP EPRT 0.0182614 20.65499 0.0947205 0.0465116 0.7586902
IBOC UCB 0.0100000 25.92547 0.0332016 0.0331754 0.7574958
VLY UMBF 0.0122166 31.77519 0.0498024 0.0560821 0.7564009
NHI RHP 0.0100000 28.72155 0.0490119 0.0410742 0.7541022
IBOC INDB 0.0100000 28.79464 0.0387352 0.0308057 0.7472181
RNST ASB 0.0142920 29.73577 0.0553360 0.0410742 0.7395061
OUT SBRA 0.0100000 29.34784 0.0513834 0.0268562 0.7365596
NWE BKH 0.0176518 28.37077 0.0616601 0.0568720 0.7351792
CDP SBRA 0.0100000 37.09768 0.0632411 0.0458136 0.7344280
ABCB UMBF 0.0100000 37.02480 0.0403162 0.0442338 0.7302945
FELE WTS 0.0179630 25.40347 0.0577075 0.0442338 0.7290248
HP WHD 0.0100000 34.75352 0.0421196 0.0461330 0.7280741
IBOC AX 0.0100000 31.90460 0.0363636 0.0363349 0.7223014
ABCB VLY 0.0100000 31.28010 0.0324111 0.0221169 0.7199872
MUR RIG 0.0173514 33.94921 0.0671937 0.0568720 0.7150243
SR NJR 0.0109400 42.63456 0.0505929 0.0521327 0.7070444
ABCB HWC 0.0253757 29.73378 0.0695652 0.0655608 0.7007080
UCB UMBF 0.0100000 41.62628 0.0411067 0.0292259 0.6989972
NHI CTRE 0.0100000 36.19026 0.0347826 0.0442338 0.6950458
RNST HWC 0.0130178 37.97848 0.0569170 0.0252765 0.6937561
IBOC VLY 0.0100000 48.24171 0.0387352 0.0568720 0.6905040
TMHC IBP 0.0126774 39.01908 0.0592885 0.0363349 0.6891775
TCBI UMBF 0.0100000 36.47614 0.0229249 0.0379147 0.6858607
NHI IRT 0.0100000 31.43695 0.0221344 0.0513428 0.6834500
OGS SR 0.0100000 60.76330 0.0434783 0.0631912 0.6826635
ZWS WTS 0.0196690 26.42704 0.0537549 0.0363349 0.6813396
AUB HWC 0.0100000 45.68916 0.0434783 0.0221169 0.6793768
FELE ZWS 0.0167806 28.76564 0.0395257 0.0481833 0.6783621
OUT NHI 0.0100000 32.49217 0.0308300 0.0244866 0.6781651
IBOC ENVA 0.0194988 33.46433 0.0624506 0.0560821 0.6780390
CATY ASB 0.0100000 38.17759 0.0276680 0.0134281 0.6720211
NHI SBRA 0.0100000 47.80517 0.0387352 0.0505529 0.6702805
CATY AUB 0.0206965 36.99237 0.0671937 0.0434439 0.6645615
BKU WSFS 0.0310750 32.26862 0.0664032 0.0868878 0.6531103
WSFS AUB 0.0100000 42.87030 0.0245059 0.0221169 0.6510033
BGC CATY 0.0100000 50.73482 0.0260870 0.0315956 0.6276560
RNST VLY 0.0109103 62.44461 0.0308300 0.0489731 0.6175357
RNST UMBF 0.0100000 52.57616 0.0245059 0.0276461 0.6158717
INDB ONB 0.0262279 43.31820 0.0561265 0.0710900 0.6074868
UCB HWC 0.0100000 60.57633 0.0245059 0.0292259 0.6050154
IBOC HWC 0.0100000 67.32100 0.0229249 0.0450237 0.5861110
IBOC UMBF 0.0100000 59.51834 0.0284585 0.0071090 0.5764147
AX UMBF 0.0162909 49.75532 0.0521739 0.0197472 0.5743123
UCB ABCB 0.0100000 71.37739 0.0371542 0.0094787 0.5740845
IBOC ASB 0.0100000 87.77295 0.0371542 0.0505529 0.5669575
OGS POR 0.0244528 40.17737 0.0403162 0.0402844 0.5587288
WSFS IBOC 0.0147520 78.78158 0.0292490 0.0837283 0.5516586
MWA ENS 0.0218357 35.78825 0.0434783 0.0371248 0.5487730
IBOC ABCB 0.0100000 73.11269 0.0260870 0.0300158 0.5453746
IBOC FULT 0.0100000 87.65794 0.0276680 0.0442338 0.5427783
FLG FFIN 0.0385546 28.28036 0.0758893 0.0497630 0.5340379
WSFS UCB 0.0170780 60.23634 0.0324111 0.0355450 0.5330286
IBOC TCBI 0.0100000 71.28163 0.0229249 0.0284360 0.5276707
AUB UMBF 0.0296066 41.92046 0.0403162 0.0466035 0.5273374
IBOC SFBS 0.0204414 45.20563 0.0276680 0.0323855 0.5230209
BGC ASB 0.0329680 42.67259 0.0498024 0.0450237 0.5202895
ASB TCBI 0.0431299 29.13923 0.0474308 0.0292259 0.5191872
HWC UMBF 0.0400135 39.94392 0.0434783 0.0671406 0.5176086
CATY TCBI 0.0247289 40.61629 0.0276680 0.0331754 0.5144050
CDP IRT 0.0240036 46.32832 0.0584980 0.0513428 0.5075988
INDB GBCI 0.0469137 38.60072 0.0521739 0.0529226 0.5005500
BGC HWC 0.0232832 61.60283 0.0387352 0.0363349 0.4898185
UCB TCBI 0.0175491 78.64111 0.0371542 0.0371248 0.4773141
UCB AX 0.0372862 40.28011 0.0395257 0.0387046 0.4697040
OTTR NJR 0.0242418 70.84344 0.0371542 0.0663507 0.4658995
BGC ABCB 0.0380811 53.19447 0.0513834 0.0410742 0.4620384
TCBI VLY 0.0304324 53.55636 0.0308300 0.0513428 0.4618092
SR POR 0.0400616 32.64992 0.0284585 0.0229068 0.4617824
FELE UFPI 0.0440843 45.46943 0.0466403 0.0466035 0.4481871
BGC VLY 0.0229153 68.43908 0.0276680 0.0418641 0.4424950
ACA WTS 0.0852277 22.26249 0.0637523 0.0163636 0.4361188
CATY VLY 0.0766411 34.73599 0.0561265 0.0505529 0.4357837
ASB ABCB 0.0920708 38.04586 0.0679842 0.0639810 0.4330473
SLG SKT 0.0637475 34.25543 0.0347826 0.0616114 0.4317217
SBRA RHP 0.0473032 41.55675 0.0505929 0.0537125 0.4292167
FIBK AUB 0.0303268 72.34237 0.0308300 0.0560821 0.4283130
AUB AX 0.0410034 39.85122 0.0363636 0.0213270 0.4249584
FULT ASB 0.0540782 45.50663 0.0474308 0.0355450 0.4146750
CDP OUT 0.0411972 42.84733 0.0466403 0.0497630 0.4145982
WSFS AX 0.0353458 67.15111 0.0505929 0.0410742 0.4128501
CDP CTRE 0.0280015 49.94401 0.0379447 0.0347551 0.4127500
ONB UMBF 0.0189023 92.54043 0.0347826 0.0063191 0.4086798
CBU FFIN 0.0763476 41.30273 0.0521739 0.0442338 0.3954279
RNST AX 0.0273321 90.24370 0.0434783 0.0481833 0.3890620
IBOC PIPR 0.0496822 48.67566 0.0363636 0.0481833 0.3866275
RDN VLY 0.0281881 60.20865 0.0292490 0.0205371 0.3828196
OGS TXNM 0.0999563 46.38068 0.0695652 0.0592417 0.3811694
CBU FBP 0.0558204 51.24446 0.0403162 0.0458136 0.3792402
AEO URBN 0.0648587 53.96487 0.0521739 0.0576619 0.3764495
CATY ABCB 0.0610190 37.92438 0.0324111 0.0229068 0.3755930
KBH IBP 0.0581278 45.07792 0.0426877 0.0189573 0.3727366
SWX POR 0.0639836 45.26568 0.0411067 0.0592417 0.3692214
FULT TCBI 0.0705543 39.90537 0.0379447 0.0418641 0.3629303
IBOC GBCI 0.0400213 64.97056 0.0205534 0.0481833 0.3524255
PIPR ONB 0.0616727 61.45374 0.0466403 0.0545024 0.3457987
MWA NPO 0.0514109 38.41018 0.0371542 0.0221169 0.3455351
AUB ONB 0.0964546 50.09245 0.0490119 0.0703002 0.3433461
GPI ABG 0.0944825 42.46309 0.0553360 0.0410742 0.3422301
TCBI HWC 0.0534527 53.65439 0.0213439 0.0355450 0.3416343
FIBK TCBI 0.0220729 98.43102 0.0213439 0.0355450 0.3301580
OUT RHP 0.0807783 40.45557 0.0513834 0.0363349 0.3278605
ASB UMBF 0.0758241 71.50611 0.0482213 0.0703002 0.3213590
FULT VLY 0.0699648 57.73031 0.0316206 0.0497630 0.3181433
NWE SR 0.0930485 41.13884 0.0363636 0.0458136 0.3154110
HOMB AX 0.0302993 68.24230 0.0205534 0.0276461 0.3006546
MHO PATK 0.0552816 75.12931 0.0395257 0.0505529 0.2887243
BKU RNST 0.0984566 40.08376 0.0284585 0.0331754 0.2811918
OTTR SWX 0.0476525 106.34067 0.0395257 0.0521327 0.2803033
CDP NHI 0.0887642 57.26630 0.0521739 0.0600316 0.2682673
SR BKH 0.0773060 54.58750 0.0260870 0.0339652 0.2551352
BKU AX 0.0942112 62.20448 0.0411067 0.0624013 0.2391245
FLG UBSI 0.0924003 57.63963 0.0577075 0.0402844 0.2344719
AX VLY 0.0965462 59.86371 0.0371542 0.0126382 0.1976889
AVA OTTR 0.0772125 62.24236 0.0268775 0.0086888 0.1809389
SBRA CTRE 0.0909055 57.54684 0.0268775 0.0347551 0.1585796
RDN ABCB 0.0730927 87.20046 0.0292490 0.0134281 0.1224459
RDN ASB 0.0696406 91.38151 0.0276680 0.0134281 0.1182564

8 Example Pairs

8.1 NWE vs. POR

8.2 MHO vs. IBP

8.3 SLG vs. MAC

9 Summary and Next Steps

This analysis presents a disciplined approach to equity pair selection that prioritizes robustness over complexity. By combining industry constraints, lightweight pre-screening, rigorous statistical diagnostics, and explicit out-of-sample validation, the resulting pairs are better aligned with real-world statistical arbitrage constraints.

Future extensions could include rolling-window persistence metrics, hedge ratio stability diagnostics, or integration with execution-aware backtesting frameworks.

10 Limitations

This work focuses intentionally on pair selection rather than full strategy backtesting or deployment. Several important considerations are therefore out of scope:

  • Transaction costs and market impact: The analysis does not model bid–ask spreads, slippage, or borrow costs, all of which can materially affect realized performance for small-cap equities.

  • Execution and signal design: While spreads and z-scores are computed for diagnostics, the report does not define entry/exit rules, position sizing, or portfolio constraints.

  • Structural breaks: Cointegration and half-life are evaluated on fixed train/test windows, but real-world relationships may shift within either window. A rolling persistence study would provide a stronger view of temporal stability.

  • Survivorship and data quality risks: Results depend on the completeness and correctness of the Russell 2000 membership and price data. Corporate actions, symbol changes, and missing observations can bias estimates if not carefully controlled.

  • Multiple hypothesis testing: Screening many candidate pairs increases the chance of false positives. Out-of-sample evaluation mitigates this risk but does not eliminate it.